Topic Modeling Communities of Practice to Identify Learning Barriers

X-DBER 2023

Tim Ransom

Who I am

  • Computer Scientist
  • Mathematician
  • English Speaker
  • Clemson ESED graduate student
  • Cat person

photograph of Tim

Tim's cat Ada

Uses of scraping and NLP in classrooms

  • determine which topics students ask for help with online
    • (assumption) frequently requested help topics indicate difficult-to-learn topics
  • determine which topics students can find answers for online
    • (assumption) students are searching for help on Reddit
  • determine which topics students might be most exposed to online
  • identify barriers to learning

What is Reddit?

  • Pseudonymous, BBS-style social media
    • (basically image/text forums with user accounts)
  • Organized into subreddits around topics
Community        Learning Community
r/Python         r/learnpython
r/math           r/learnmath
r/engineering    r/EngineeringStudents
r/Physics        r/learnPhysics

Conceptualizing subreddits as communities of practice

  • people → subscribed users
  • practice → topic of the subreddit
  • culture → internet & domain

Latent Dirichlet Allocation

  • LDA [1] is a well-established NLP topic modelling algorithm
  • Each document is modelled as a mixture of topics, and each topic as a distribution over terms
  • With enough documents, LDA can infer which terms are associated with which topics
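To make the bullets above concrete, here is a minimal sketch of LDA fitted by collapsed Gibbs sampling, one standard inference method (the original LDA paper [1] used variational inference instead). It is pure Python for readability; the function name `lda_gibbs`, the hyperparameter defaults (`alpha`, `beta`, `iterations`), and the toy posts are illustrative assumptions, not the settings used in this study.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA to tokenized documents with collapsed Gibbs sampling.

    docs: list of documents, each a list of string tokens.
    Returns (doc_topic, topic_word):
      doc_topic[d][k]  -- estimated proportion of topic k in document d
      topic_word[k][w] -- estimated probability of term w under topic k
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    V, K = len(vocab), num_topics
    n_dk = [[0] * K for _ in docs]               # tokens in doc d assigned to topic k
    n_kw = [defaultdict(int) for _ in range(K)]  # times term w is assigned to topic k
    n_k = [0] * K                                # total tokens assigned to topic k

    # Start from a random topic assignment for every token.
    z = []
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(K)
            z[d].append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Take the token's current assignment out of the counts.
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Unnormalized full conditional p(z_i = k | all other assignments).
                weights = [(n_dk[d][j] + alpha) * (n_kw[j][w] + beta) / (n_k[j] + V * beta)
                           for j in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                # Record the resampled assignment.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    # Smoothed point estimates of theta (doc-topic) and phi (topic-term).
    doc_topic = [[(n_dk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
                 for d, doc in enumerate(docs)]
    topic_word = [{w: (n_kw[k][w] + beta) / (n_k[k] + V * beta) for w in vocab}
                  for k in range(K)]
    return doc_topic, topic_word


if __name__ == "__main__":
    # Hypothetical toy "posts": two about Python loops, two about calculus.
    docs = [["loop", "list", "index", "loop"], ["list", "loop", "error"],
            ["derivative", "integral", "limit"], ["integral", "limit", "proof"]]
    doc_topic, topic_word = lda_gibbs(docs, num_topics=2)
    for k, dist in enumerate(topic_word):
        top = sorted(dist, key=dist.get, reverse=True)[:3]
        print(f"topic {k}: {top}")
```

The "mixture of topics" per document is `doc_topic`, and the "distribution over terms" per topic is `topic_word`. On real Reddit data a tuned library implementation would normally replace this sketch.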

We’ll be topic modelling Reddit post data
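Before modelling, each post has to become a list of tokens. The sketch below assumes posts were collected in the shape of Reddit's public JSON listings (a `data.children` array whose entries carry `title`, `selftext`, and `created_utc`); the talk does not say how the data was gathered. The helper names, the regex, and the deliberately tiny stop-word list are hypothetical.

```python
import json
import re

# Hypothetical, deliberately tiny stop-word list; a real study would use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "or", "to", "of", "in", "is", "it",
              "my", "for", "do", "how", "what", "this", "with", "on"}

# Word-like tokens of two or more characters (single letters are dropped).
TOKEN_RE = re.compile(r"[a-z][a-z0-9_]+")


def tokenize(text):
    """Lowercase the text, extract word-like tokens, and drop stop words."""
    return [tok for tok in TOKEN_RE.findall(text.lower()) if tok not in STOP_WORDS]


def posts_to_documents(listing):
    """Turn a parsed Reddit listing into (created_utc, tokens) pairs.

    Each post's title and self-text are joined into a single document.
    """
    documents = []
    for child in listing["data"]["children"]:
        post = child["data"]
        text = post.get("title", "") + " " + post.get("selftext", "")
        documents.append((post.get("created_utc"), tokenize(text)))
    return documents


# Hypothetical one-post listing in the assumed JSON shape.
sample = json.loads("""
{"kind": "Listing", "data": {"children": [
  {"kind": "t3", "data": {"title": "How do I reverse a list?",
                          "selftext": "My for loop skips the last index.",
                          "created_utc": 1672531200.0}}
]}}
""")
print(posts_to_documents(sample))
# -> [(1672531200.0, ['reverse', 'list', 'loop', 'skips', 'last', 'index'])]
```

Keeping `created_utc` alongside the tokens allows topics to be tracked over time later; only the token lists are passed to the topic model.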

Illustrative Example

Subreddit CoP Topics Can Overlap

r/Python word clouds

r/learnpython word clouds
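Word clouds typically scale each word by its frequency, so the overlap they show can also be put into a single number. One hedged option, assuming each subreddit's posts are already tokenized, is the Jaccard similarity of the two subreddits' most frequent terms. Jaccard is a measure swapped in here for illustration, not necessarily how overlap was assessed in this work; `top_terms`, `jaccard`, and the toy posts are hypothetical.

```python
from collections import Counter


def top_terms(documents, n=10):
    """Return the n most frequent terms across a subreddit's tokenized posts.

    Raw term frequencies like these are what a word cloud typically scales words by.
    """
    counts = Counter(tok for doc in documents for tok in doc)
    return [term for term, _ in counts.most_common(n)]


def jaccard(a, b):
    """Size of the intersection over size of the union of two term collections."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical tokenized posts from two subreddits.
python_docs = [["pandas", "release", "typing"], ["pandas", "release"], ["pandas"]]
learnpython_docs = [["loop", "pandas", "list"], ["loop", "pandas"], ["loop"]]

shared = jaccard(top_terms(python_docs, 2), top_terms(learnpython_docs, 2))
print(round(shared, 2))  # -> 0.33 (only "pandas" appears in both top-2 lists)
```

The same comparison can be run on the top terms of LDA topics instead of raw frequencies, which compares the two communities at the topic level.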

Single Subreddit Topics

r/Python

r/learnpython

r/learnpython over time
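Tracking a subreddit over time amounts to averaging each topic's share of posts within a time window. The sketch below assumes each post has a Unix timestamp (such as Reddit's `created_utc`) and a per-post topic mixture from a fitted LDA model; the calendar-month bins, the function name, and the example values are hypothetical choices.

```python
from collections import defaultdict
from datetime import datetime, timezone


def monthly_topic_prevalence(doc_times, doc_topic):
    """Average each topic's share of posts per calendar month (UTC).

    doc_times: Unix timestamps, one per document.
    doc_topic: per-document topic proportions, e.g. from a fitted LDA model.
    Returns {"YYYY-MM": [mean proportion of topic 0, topic 1, ...]}.
    """
    sums = defaultdict(lambda: [0.0] * len(doc_topic[0]))
    counts = defaultdict(int)
    for ts, mixture in zip(doc_times, doc_topic):
        month = datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m")
        for k, p in enumerate(mixture):
            sums[month][k] += p
        counts[month] += 1
    return {m: [s / counts[m] for s in sums[m]] for m in sorted(sums)}


# Hypothetical example: three posts, two topics.
times = [1672531200, 1673740800, 1675209600]  # 2023-01-01, 2023-01-15, 2023-02-01 (UTC)
mixtures = [[0.75, 0.25], [0.5, 0.5], [0.25, 0.75]]
print(monthly_topic_prevalence(times, mixtures))
# -> {'2023-01': [0.625, 0.375], '2023-02': [0.25, 0.75]}
```

A topic whose share keeps rising in a learning subreddit would be a candidate for closer reading, under the talk's assumption that frequently requested help topics are difficult to learn.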

Identified Learning Barriers

Next Steps

  • Interpret other subreddits
  • Interpret other disciplines

References

[1] D. M. Blei, A. Y. Ng, and M. I. Jordan, “Latent Dirichlet allocation,” Journal of Machine Learning Research, vol. 3, pp. 993–1022, Jan. 2003.

Contact info

email: tsranso@clemson.edu

github: https://github.com/ransomts

website: tsranso.people.clemson.edu